Convex Optimization Procedure for Clustering: Theoretical Revisit
نویسندگان
چکیده
In this paper, we present theoretical analysis of SON – a convex optimization procedure for clustering using a sum-of-norms (SON) regularization recently proposed in [8, 10, 11, 17]. In particular, we show if the samples are drawn from two cubes, each being one cluster, then SON can provably identify the cluster membership provided that the distance between the two cubes is larger than a threshold which (linearly) depends on the size of the cube and the ratio of numbers of samples in each cluster. To the best of our knowledge, this paper is the first to provide a rigorous analysis to understand why and when SON works. We believe this may provide important insights to develop novel convex optimization based algorithms for clustering.
منابع مشابه
Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers
Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering in which there is no need to be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...
متن کاملA Revisit to Support Vector Data Description
Support vector data description (SVDD) is a useful method for outlier detection and has been applied to a variety of applications. However, in the existing optimization procedure of SVDD, there are some issues which may lead to improper usage of SVDD. Some of the issues might already be known in practice, but the theoretical discussion, justification and correction are still lacking. Given the ...
متن کاملInformation Theoretical Clustering via Semidefinite Programming
We propose techniques of convex optimization for information theoretical clustering. The clustering objective is to maximize the mutual information between data points and cluster assignments. We formulate this problem first as an instance of max k cut on weighted graphs. We then apply the technique of semidefinite programming (SDP) relaxation to obtain a convex SDP problem. We show how the sol...
متن کاملRobust and Computationally Feasible Community Detection in the Presence of Arbitrary Outlier Nodes
Community detection, which aims to cluster N nodes in a given graph into r distinct groups based on the observed undirected edges, is an important problem in network data analysis. In this paper, the popular stochastic block model (SBM) is extended to the generalized stochastic block model (GSBM) that allows for adversarial outlier nodes, which are connected with the other nodes in the graph in...
متن کاملRobust and Computationally Feasible Community Detection in the Presence of Arbitrary Outlier Nodes1 by T. Tony Cai
Community detection, which aims to cluster N nodes in a given graph into r distinct groups based on the observed undirected edges, is an important problem in network data analysis. In this paper, the popular stochastic block model (SBM) is extended to the generalized stochastic block model (GSBM) that allows for adversarial outlier nodes, which are connected with the other nodes in the graph in...
متن کامل